Memory-Based Morphological Analysis Generation and Part-of-Speech Tagging of Arabic

نویسندگان

  • Erwin Marsi
  • Antal van den Bosch
  • Abdelhadi Soudi
چکیده

We explore the application of memorybased learning to morphological analysis and part-of-speech tagging of written Arabic, based on data from the Arabic Treebank. Morphological analysis – the construction of all possible analyses of isolated unvoweled wordforms – is performed as a letter-by-letter operation prediction task, where the operation encodes segmentation, part-of-speech, character changes, and vocalization. Part-of-speech tagging is carried out by a bi-modular tagger that has a subtagger for known words and one for unknown words. We report on the performance of the morphological analyzer and part-of-speech tagger. We observe that the tagger, which has an accuracy of 91.9% on new data, can be used to select the appropriate morphological analysis of words in context at a precision of 64.0 and a recall of 89.7.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Memory-based morphological analysis and part-of-speech tagging of Arabic

Memory-based learning has been successfully applied to morphological analysis and part-ofspeech tagging in Western and Eastern-European languages (Daelemans et al., 1996; Van den Bosch and Daelemans, 1999; Zavrel and Daelemans, 1999). With the release of the Arabic Treebank by the Linguistic Data Consortium, a large corpus has become available for Arabic that can act as training material for ma...

متن کامل

سیستم برچسب گذاری اجزای واژگانی کلام در زبان فارسی

Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas in natural language processing such as machine translation, spell checker, text-to-speech, automatic speech recognition, etc. So far, high accurate POS taggers have been created in many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

متن کامل

ACL - 05 Computational Approaches to Semitic Languages

We explore the application of memorybased learning to morphological analysis and part-of-speech tagging of written Arabic, based on data from the Arabic Treebank. Morphological analysis – the construction of all possible analyses of isolated unvoweled wordforms – is performed as a letter-by-letter operation prediction task, where the operation encodes segmentation, part-of-speech, character cha...

متن کامل

Part-of-Speech Tagging of Dutch with MBT, a Memory-Based Tagger Generator

We present a part of speech tagger (morphosyntactic disambiguator) for Dutch, constructed by means of the Memory-Based Tagger generation method. In this approach, inductive learning methods are used to derive a tagger, lexicon and unknown word category guesser fully automatically from a tagged example corpus. Advantages of the approach are (i) fast tagger development time without linguistic eng...

متن کامل

Joint Arabic Segmentation and Part-Of-Speech Tagging

Arabic has a very complex morphological system, though a very structured one. Character patterns are often indicative of word class and word segmentation. In this paper, we e xplore a novel approach to Arabic word segmentation and part-of-speech tagging relying on character information. The approach is lexicon-free and does not require any morphological analysis, eliminat ing the factor of dict...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005